http://www.jbsdonline.com
Abstract 


In this study, two homology models (denoted as MproST and MproSH) of the main proteinase
(Mpro) from the novel coronavirus associated with severe acute respiratory syndrome
(SARS-CoV) were constructed based on the crystal structures of Mpro from transmissible
gastroenteritis coronavirus (TGEV) (MproT) and human coronavirus HCoV-229E (MproH),
respectively. Both MproST and MproSH exhibit folds similar to those of their respective
template proteins. These homology models reveal three distinct functional domains as well
as an intervening loop connecting domains II and III, as found in both template proteins.
A catalytic cleft containing the substrate binding sites S1 and S2 between domains I and II
is also observed. S2 undergoes more significant structural fluctuation than S1 during the
400 ps molecular dynamics simulations because it is located at the open mouth of the
catalytic cleft, while S1 is situated at the very bottom of this cleft. The thermal
unfolding of these proteins begins at domain III, where the structure is least conserved
among these proteins. Mpro may still maintain its proteolytic activity while it is
partially unfolded. The electrostatic interaction between Arg40 and Asp186 plays an
important role in maintaining the structural integrity of both S1 and S2.


Key words: Homology, Main Proteinase, Coronavirus, Severe acute respiratory syndrome 
(SARS), Substrate binding site, Molecular dynamics simulations. 


Introduction 


An outbreak of atypical pneumonia, designated as severe acute respiratory syndrome
(SARS), was first reported in Guangdong Province of China in late 2002 and rapidly
spread to several countries (1, 2). SARS infection is usually characterized by high
fever, malaise, rigor, headache, and nonproductive cough and may progress to
generalized, interstitial infiltrates in the lung (3). The sequence of the complete
genome of the coronavirus associated with SARS (SARS-CoV) has been determined and
characterized with two different isolates (4, 5). Phylogenetic analyses and sequence
comparisons have further revealed that SARS-CoV is not closely related to any of the
three groups of coronaviruses.


Coronaviruses belong to a diverse group of positive-stranded RNA viruses featuring the
largest viral RNA genomes. They share a similar genome organization and common
transcriptional and translational processes with the Arteriviridae (6, 7). The human
coronavirus HCoV-229E replicase gene encodes two overlapping polyproteins, pp1a and
pp1ab (8), that mediate all the functions required for viral replication and
transcription (9). The functional polypeptides are released from the polyproteins by
extensive proteolytic processing, which is primarily achieved by the 33.1-kDa HCoV-229E
main proteinase (Mpro) (10). Mpro is also commonly called 3C-like proteinase (3CLpro) to
indicate a similarity of its cleavage site specificity to that observed for picornavirus
3C proteinase (3Cpro) and the identification of a Cys residue as the principal
nucleophile in the context of a predicted two-β-barrel fold (11, 12). Mpro from HCoV-229E
(MproH) has been biosynthesized in Escherichia coli, and the enzyme properties, inhibitor
profile, and substrate specificity of the purified protein have been well characterized
(10, 13).


Abbreviations: 3CLpro: 3C-like proteinase; 3D: Three-dimensional; ASA: Accessible surface
area; CVFF: Consistent valence force field; DSSP: Dictionary of secondary structure
pattern; MD: Molecular dynamics; Mpro: Main proteinase; MproH: Main proteinase of human
coronavirus HCoV-229E; MproS: Main proteinase of SARS-CoV; MproSH: Homology model of
MproS based on the crystal structure of MproH; MproST: Homology model of MproS based on
the crystal structure of MproT; MproT: Main proteinase of TGEV; PBC: Periodic boundary
condition; RMSD: Root-mean-square deviation; SARS: Severe acute respiratory syndrome.


Recently, the crystal structures of MproH (14) and Mpro from porcine coronavirus
(transmissible gastroenteritis virus, TGEV) (MproT) complexed with its inhibitor (15)
have been determined. In addition, homology models of MproS based on the crystal
structures of MproH (14) and MproT (16, 17) have also been constructed. Comparison of
these structures reveals a remarkable degree of conservation of the substrate binding
sites, which is further supported by the cleavage of the substrate for MproT by the
recombinant MproS (14). In addition, MproS exhibits 40 and 44% sequence identity to
MproH and MproT, respectively (14).


Molecular dynamics (MD) simulations at the atomic level have been intensively performed
to gain insight into protein unfolding from the native state induced by raising the
temperature (18-20), changing the solvent (21), or increasing the pressure (22). Usually,
temperatures in the range of 400 to 600 K are employed. According to the Arrhenius
equation, the unfolding rate is expected to be approximately 10^3-, 10^6-, and 10^9-fold
faster than that observed experimentally when the temperature is increased by 100, 200,
and 300 °C, respectively (23). Daggett and Levitt (24) have shown that the results
obtained from MD simulations of protein unfolding induced by increasing the temperature
should be reliable by comparing their results to the pH-induced denaturation of barnase
(25). Previously, several MD simulations, homology modeling, and molecular docking
experiments have been successfully conducted on various target proteins in our group
(26-35). In this paper, two homology models of Mpro from SARS-CoV (MproS) were
constructed (denoted as MproST and MproSH) based on the crystal structures of MproT (15)
and MproH (14), respectively. Subsequently, MD simulations combined with a
temperature-jump technique were conducted to investigate the structural variations of
these proteins. Beyond the continued characterization of Mpro from various coronaviruses,
the amino acid sequence alignment, structural homology analyses, and MD simulations of
MproS presented in this study should provide particularly attractive targets for further
structure-based design of anti-SARS drugs.


Methods 
Model Proteins 


Two homology models of MproS (MproST and MproSH) were constructed based on the monomer
of the three-dimensional (3D) structure of MproT refined to 1.96 Å resolution (15)
(Fig. 1A) and MproH solved at 2.54 Å resolution (14) (Fig. 1B), which were obtained from
the Protein Data Bank (PDB; accession numbers 1lvo and 1p9u, respectively). The
inhibitor, a substrate analog hexapeptidyl chloromethyl ketone, was removed from the
crystal structure of MproT before it was used as the template. Unfavorable nonphysical
contacts in these structures were then eliminated using the Biopolymer module of the
Insight II program (Accelrys, San Diego, CA, USA) with the Discover CVFF (consistent
valence force field) (36-38) on an SGI O200 workstation with 64-bit MIPS RISC R12000
2 x 270 MHz CPU and PMC-Sierra RM7000A 350 MHz processor (Silicon Graphics, Inc.,
Mountain View, CA, USA), followed by 10,000 steps of energy minimization using the
steepest descent method, to yield the model proteins for further structure building.


Downloaded by [University of Connecticut] at 13:22 12 October 2014

MD Simulations of Coronavirus Mpro


Figure 1: The crystal structures of (A) MproT (15) and (B) MproH (14) and the homology
models of (C) MproST and (D) MproSH. These structures are visualized with the Insight II
program. The N- and C-termini are indicated. Secondary structure elements are labeled as
in Table I. α-Helices are shown as red cylinders, while β-strands are illustrated as
arrows pointing from the N- to the C-terminus. The polypeptide backbones belonging to
the turn and random coil regions are shown in blue and green, respectively. The general
acid-base catalyst His residue and the nucleophilic Cys residue are labeled. The
locations of the substrate binding sites S1 and S2 are indicated.


Figure 2: Amino acid sequence alignment of MproT, MproH, and MproS. Secondary structures
as defined in the crystallographic structure of MproT (15) are shown on top. The start
and end amino acid residues are numbered in the brackets on the left and right of each
sequence, respectively. Residues totally conserved in all sequences are indicated in red
letters with green background. Residues conserved in MproT and MproH but different from
those in MproS are represented in black letters with yellow background. Residues where
variations occur are given in blue or brown letters with grey background. The amino acid
residues missing in both MproT and MproH are shown as dashed lines.


Table I
The amino acid sequence identities among MproH, MproT, and MproS.

                     Identity (%)
                     Total    Domain I   Domain II   Domain III
MproH and MproT      60.80    63.44      65.06       35.45
MproH and MproS      40.19    41.94      45.78       35.64
MproT and MproS      43.85    44.09      49.40       39.22


Structural Homology 


Homology utilizes structure and sequence similarities for predicting unknown protein
structures. The Homology module in Insight II allows us to build the 3D models of the
target protein (i.e., MproS) using both its amino acid sequence and the structures of
known, related template proteins (i.e., MproH and MproT). The Homology program provides
simultaneous optimization of both structure and sequence homologies for multiple proteins
in a 3D graphics environment, based on a method developed by Greer (39). Amino acid
sequences of MproH (Accession Q05002) (40), MproT (NC_002306.2) (41), and MproS
(NC_004718.3) (4) were obtained from either the Swiss-Prot or the NCBI database.
Smith-Waterman pairwise amino acid sequence alignments were performed based on the
conserved structural features among Mpro from various coronaviruses to find the location
of the active site and substrate binding sites S1 and S2 of MproS. The consensus
structurally conserved regions (SCRs) of MproS were generated from alignments of the
target protein to the template proteins. The atomic coordinates were then transferred
from the template proteins to MproS in each SCR using the Mutation Matrix module of the
Insight II program. Automatic loop building was performed either by database searching
(42) or by generation through random conformational search (43). The coordinates at the
N- and C-termini of these loops were then automatically assigned. Side chains of MproS
were automatically replaced, preserving the conformations of the template proteins. The
side chain conformations were optimized either manually or automatically using a rotamer
library (44). Secondary structure motifs were identified by database searching and
defined by DSSP (45). The bond lengths and torsion angles in the SCRs and loop regions
were repaired and relaxed using Homology/Refine/SpliceRepair and Homology/Refine/Relax,
respectively. The newly built structures of MproS were substantially refined to avoid
van der Waals radius overlapping, unfavorable atomic distances, and undesirable torsion
angles using the molecular mechanics and dynamics features in the Discover module.
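
The pairwise alignment step mentioned in this section can be illustrated with a minimal Smith-Waterman sketch. It is not the Insight II Homology implementation: the function name `smith_waterman` and the simple match/mismatch/linear-gap scores are assumptions made for illustration, whereas protein alignments in practice normally rely on a substitution matrix and more elaborate gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, not those used in the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; local alignment floors scores at 0.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Toy sequences sharing the local motif "CGS".
    print(smith_waterman("AAACGSAA", "TTCGSTT"))
```

Because the score is floored at zero, the alignment reports only the best-matching local segment, which is why this kind of alignment is suited to locating conserved motifs such as those surrounding the catalytic residues.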


Molecular Dynamics Simulations 


The crystal structures of MproH and MproT and the homology models of MproSH and MproST
were subjected to energy minimization by the steepest descent method with 3,000
iterations followed by the Newton-Raphson method with 5,000 iterations to be used as the
initial energy-minimized structures for further structural comparison. Each
energy-minimized structure was subsequently placed in the center of a lattice with a
size of 50 x 60 x 85 Å³ filled with 6,222, 5,866, 5,836, and 5,776 water molecules for
the systems of MproH, MproT, MproSH, and MproST, respectively. These systems, composed of
the protein and water molecules, were then equilibrated by performing 20,000 steps of
steepest descent minimization and 10 ps of dynamics calculations. The explicit image
periodic boundary condition (PBC) was used for solvent equilibration. At the end of the
explicit image equilibration, Discover re-images any molecule whose center of mass has
moved out of the lattice in order to maintain the integrity of the lattice with a
relatively constant density. Finally, a 400 ps MD simulation was carried out for each
system using the Discover module of Insight II. The temperature and pressure were
maintained for each MD simulation by weakly coupling the system to a heat bath at 300,
400, or 600 K and an external pressure bath at one atmosphere with a coupling constant
of 0.5 ps, according to the method described by Berendsen et al. (46). A cut-off radius
of 13 Å for the non-bonded interactions was applied to each MD simulation. The time step
of the MD simulations was 1 fs. The trajectories and coordinates of these structures
were recorded every 2 ps for further analyses.
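
The weak-coupling scheme and the simulation bookkeeping described above can be sketched as follows. This is an idealized illustration, not the Discover implementation: the function names are hypothetical, only the thermostat acts on the temperature (no real dynamics are integrated), and the velocity scaling factor is the standard Berendsen form λ = sqrt(1 + (Δt/τ)(T0/T - 1)), using the 1 fs time step and 0.5 ps coupling constant quoted in the text.

```python
import math


def berendsen_scale(t_current, t_target, dt_ps=0.001, tau_ps=0.5):
    """Velocity scaling factor for one weak-coupling step (Berendsen form)."""
    return math.sqrt(1.0 + (dt_ps / tau_ps) * (t_target / t_current - 1.0))


def relax_temperature(t_start, t_target, n_steps, dt_ps=0.001, tau_ps=0.5):
    """Apply the thermostat alone for n_steps and return the final temperature."""
    t = t_start
    for _ in range(n_steps):
        lam = berendsen_scale(t, t_target, dt_ps, tau_ps)
        t *= lam * lam  # kinetic temperature scales with the velocity squared
    return t


# Bookkeeping for one production run: 400 ps, 1 fs steps, frames saved every 2 ps.
TOTAL_FS, STEP_FS, SAVE_EVERY_FS = 400_000, 1, 2_000
n_steps = TOTAL_FS // STEP_FS
n_frames = TOTAL_FS // SAVE_EVERY_FS

if __name__ == "__main__":
    print(n_steps, n_frames)
    print(relax_temperature(400.0, 300.0, 5000))
```

With these settings, a deviation from the bath temperature decays exponentially with a time constant of 0.5 ps, so the coupling perturbs the dynamics only gently on the time scale of a single step.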


Structural Analyses 


Although some complicated algorithms have been proposed to measure the structural
similarity between proteins (47, 48), the root-mean-square deviation (RMSD) remains the
simplest one for closely related proteins (49). For each MD simulation, the RMSDs of the
trajectories recorded at 2 ps intervals were calculated for the backbone Cα atoms of the
entire protein, the substrate binding sites S1 and S2, and domains I, II, and III during
the course of the 400 ps MD simulations with reference to the respective starting
structure according to Koehl (50). The RMSDs were calculated after optimal
superimposition of the coordinates to remove translational and rotational motion (51).
Secondary structures were assigned based on DSSP (45), in which pattern recognition of
hydrogen bonds is correlated with geometrical features. The default hydrogen bonding
energy criterion of -0.5 kcal/mol was used. Accessible surface areas (ASAs) of the
substrate binding sites S1 and S2 and the distances between the sulfur atom of the
nucleophilic Cys residue and the Nε2 atom of the general acid-base catalyst His residue
and between the Cζ atom of the totally conserved Arg40 from S2 and the Cγ atom of the
totally conserved Asp186 from the extended loop connecting domains II and III (numbered
as in MproT) for each structure were also recorded as a function of MD simulation time.
The average secondary structure content was defined as the ratio of the number of
residual H bonds at time t to the number of total H bonds in the starting structure.


Figure 3: The RMSDs of the backbone Cα for (A) the entire protein, (B) substrate binding
site S1, (C) substrate binding site S2, (D) domain I, (E) domain II, and (F) domain III
of MproT, MproH, MproST, and MproSH with reference to their respective starting structure
during the 400 ps MD simulations at 300, 400, and 600 K.

Figure 4: Secondary structures predicted according to DSSP (45) as a function of MD
simulation time for (A) MproT, (B) MproH, (C) MproST, and (D) MproSH. α-Helix, β-sheet,
turn, and coil are shown in red, light yellow, blue, and green, respectively.

Liu et al.
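
The RMSD after optimal superimposition used in this section can be illustrated with a small, dependency-free sketch. It is not the procedure of references (50, 51) nor the Insight II code: the function name `superposed_rmsd` is hypothetical, and the minimum RMSD is obtained here from the largest eigenvalue of the 4 x 4 quaternion key matrix (Horn's closed-form approach), found by a simple shifted power iteration that is assumed to have converged after a fixed number of steps.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) points so that the centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def superposed_rmsd(mobile, reference, iterations=500):
    """Minimum RMSD between two equally sized point sets over all rigid motions."""
    if len(mobile) != len(reference) or not mobile:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    p, q = _centered(mobile), _centered(reference)
    n = len(p)
    g = sum(x * x + y * y + z * z for x, y, z in p + q)  # sum of squared norms

    # 3 x 3 correlation matrix S[a][b] = sum_i p_i[a] * q_i[b].
    s = [[sum(u[a] * v[b] for u, v in zip(p, q)) for b in range(3)] for a in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s

    # Symmetric 4 x 4 key matrix; its largest eigenvalue gives the best overlap.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]

    # Shift by g/2 so that all eigenvalues are non-negative, then power-iterate.
    shift = g / 2.0
    m = [[key[i][j] + (shift if i == j else 0.0) for j in range(4)] for i in range(4)]
    v = [1.0, 0.3, 0.2, 0.1]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * key[i][j] * v[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
    # Rotate by 90 degrees about z and translate: the superposed RMSD should vanish.
    moved = [(-y + 5.0, x - 3.0, z + 2.0) for x, y, z in ref]
    print(superposed_rmsd(moved, ref))
```

In a trajectory analysis, `reference` would hold the Cα coordinates of the starting structure and `mobile` those of each frame saved every 2 ps, optionally restricted to the residues of S1, S2, or a single domain.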


Results and Discussion 
The Homology Models of MproST and MproSH


Usually, an optimal amino acid sequence alignment based on the conserved structural
regions is essential to the success of homology modeling. The results of pairwise amino
acid sequence alignment of MproT, MproH, and MproS are given in Figure 2. There are 301,
300, and 306 residues in MproT, MproH, and MproS, respectively. The residue corresponding
to Ala46 in domain I of MproS and those corresponding to Asp248, Ile249, and Gln273 in
domain III of MproS are missing in both MproT and MproH. In addition, there are one and
two extra residues at the C-terminus of MproS compared with MproT and MproH,
respectively. Both the general acid-base catalyst and the nucleophile residue of these
three proteins are totally conserved, with the general acid-base catalyst His41 located
in a highly conserved signature sequence (LNGLWLXDXVXCPRHVI) of domain I and the
nucleophilic Cys144 for MproT and MproH or Cys145 for MproS in the highly conserved
signature sequence (TIXGSFXXGXCGSXG) of domain II (Xs indicate the nonconserved
residues). The results of amino acid sequence identity among these three proteins are
summarized in Table I. MproT and MproH show the highest amino acid identity (60.80%),
whereas MproH and MproS exhibit the lowest amino acid identity (40.19%). MproS shows
slightly higher amino acid identity to MproT than to MproH, indicating that the structure
of MproS may be more similar to that of MproT than to that of MproH. Comparing the three
domains among these three proteins, domain II has the highest amino acid identity,
whereas domain III shows the lowest amino acid identity.
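
Identity values such as those in Table I can be computed from an aligned pair of sequences along the following lines. The helper name `percent_identity` is hypothetical, and the choice of denominator (aligned columns with no gap in either sequence) is an assumption; the convention actually used for Table I is not stated in the text.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over the gap-free columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Toy aligned fragments, not taken from Figure 2.
    print(round(percent_identity("LNGLWL-D", "LNGIWLAD"), 2))
```

Per-domain identities follow from the same function by slicing both aligned strings to the columns that belong to domain I, II, or III before the call.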


The level of similarity between MproS and MproT as well as between MproS and MproH
allowed us to construct two homology models for MproS (denoted as MproST and MproSH) by
a comparative approach, and the results are illustrated in Figure 1C and D. The quality
of the geometry and of the stereochemistry of the protein structures was validated using
the Homology/ProStat/Struct_Check command of the Insight II program. A total of 97 and
96% of the backbone dihedral angle (φ and ψ) densities are located within the
structurally favorable regions of the Ramachandran plot for MproST and MproSH,
respectively (data not shown). The calculation of main chain torsion angles (χ1 and χ2)
of these proteins showed no severe distortion of the backbone geometry. In addition, all
bond lengths and angles for both homology models are located within reasonable ranges.
Moreover, the homology models of MproST and MproSH constructed in this work are very
similar to the 3D models proposed by Lee et al. (16) and Anand et al. (14),
respectively. The above evidence indicates that the quality of these homology models
should be reliable.
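
The backbone dihedral angles that enter a Ramachandran analysis can be computed from four consecutive atom positions, as in the following minimal sketch. The function names are hypothetical and this is not the ProStat implementation; the sign convention is the usual one in which a cis arrangement gives 0° and a trans arrangement gives ±180°.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    d0, d2 = _dot(b0, b1), _dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from C(i-1), N, CA, C, and N(i+1)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    print(dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)))
```

Collecting the (φ, ψ) pair of every residue and counting how many fall inside the favorable regions of the Ramachandran plot would yield percentages of the kind quoted above; the region boundaries themselves are not reproduced here.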


The results of homology modeling show that both MproST and MproSH exhibit three distinct
domains, indicating that they adopt folds similar to those of MproT and MproH,
respectively. However, the secondary structures of both MproST and MproSH predicted
according to DSSP (45) are less conserved compared with those of MproT (Fig. 1A) and
MproH (Fig. 1B), particularly in domain III. This is consistent with the results of amino
acid sequence alignment, showing that domain III exhibits the lowest sequence identity
compared with domains I and II among these proteins. Whereas domains I and II are
separated by a catalytic cleft, domains II and III are loosely connected by a long loop
(residues 184-199 in both MproT and MproH and residues 185-200 in MproS) in all
structures. Although showing the least structural identity, domain III, a globular
cluster of 5, 5, 4, and 2 helices for MproT, MproH, MproST, and MproSH, respectively
(Fig. 1), has been implicated in the proteolytic activity of Mpro (13). Comparing the two
crystal structures, MproT and MproH, and the two homology models, MproST and MproSH, we
found that domain I of MproS is more similar to that of MproH, while domains II and III
of MproS are more similar to those of MproT. The low sequence identity and secondary
structure similarity in domain III among these proteins found in the present study,
together with the previous finding that recombinant proteins in which 33, 28, and 34
C-terminal amino acid residues of Mpro from IBV, MHV, and HCoV, respectively, were
deleted showed dramatic losses of proteolytic activity, suggest that domain III may play
a minor role in proteolytic activity through an undefined mechanism (13). The putative
substrate binding sites S1 and S2 of MproST and MproSH are also located in a catalytic
cleft between domains I and II (Fig. 1C and D) and are nearly identical to those of MproT
and MproH (Fig. 1A and B). This indicates that MproS may follow substrate binding
mechanisms similar to those of MproT and MproH, allowing us to design anti-SARS drugs by
screening known proteinase inhibitors. A good example has been given by Anand et al.
(14). They proposed a 3D structural model of MproS based on the crystal structure of
MproH and further recommended the use of a rhinovirus inhibitor (codename AG7088), which
is already in clinical trials as an anti-common cold drug, as a potential model compound
for the design of anti-SARS drugs. In addition, Lee et al. (16) docked 16 available
antiviral drugs from the NCI database to the structural model of MproS and found that
four of them, with the trade names Nevirapine, Glycovir, Virazole, and Calanolide A, fit
well in the substrate binding cleft of their 3D model of MproS.


Figure 5: The ASAs of the substrate binding sites S1 and S2 at (A) 300, (B) 400, and
(C) 600 K as a function of MD simulation time for MproT, MproH, MproST, and MproSH.


Molecular Dynamics Simulations 


The structural changes of the whole protein, substrate binding sites S1 and S2, and
domains I, II, and III for MproT, MproH, MproST, and MproSH were evaluated by plotting
the main-chain Cα RMSDs at different temperatures as a function of simulation time, and
the results are shown in Figure 3A-F, respectively. At 300 K, the overall RMSDs for these
proteins all converged below 3 Å, which is in good agreement with the results from
previous MD simulations (16). In addition, the increases in the overall RMSDs for these
proteins at 400 and 600 K followed a similar pattern, except for MproH, whose overall
RMSD reached 9 Å at 600 K, whereas those of the other three proteins reached only 6 Å.
This indicates that MproH may undergo a more dramatic overall structural change at high
temperature. By comparing the RMSDs of the substrate binding sites S1 and S2 at various
temperatures (Fig. 3B and C), we found that S1 exhibits higher structural integrity than
S2. This is attributed to the fact that S2 is located at the open mouth of the catalytic
cleft between domains I and II and is fully solvent-exposed, whereas S1 is situated at
the very bottom of this cleft and is well protected by the hydrophobic core. The higher
structural variation of S2 makes it flexible enough to accommodate a bulky hydrophobic
residue from the substrate. Furthermore, S2 of MproH undergoes a more dramatic structural
change at higher temperatures than S2 of the other proteins, indicating that MproH may
lose its binding affinity towards various substrates or inhibitors more easily than the
other three Mpro proteins.


Comparing the RMSD values in Figure 3D-F, we found that domains I and II of MproT, MproH,
MproST, and MproSH follow similar dynamic behaviors, whereas domain III of these proteins
shows different structural variations during the entire simulation time courses. This
result is in good agreement with the results of amino acid sequence alignment and
homology modeling, showing that domain III of these proteins exhibits the least
structural similarity among the three domains. The secondary structure propensity of
these proteins was predicted according to DSSP (45) during the entire MD courses at
various temperatures, and the results are shown in Figure 4. The values of the average
secondary structure content for each secondary structure element in these proteins are
summarized in Table II. As expected, domain III loses its helical content faster than
domains I and II lose their sheet content in all cases. The high dielectric constant of
the explicit water system may increase the opportunity for hydrogen bonding between
amide protons and surrounding solvent molecules, simultaneously promoting intermolecular
hydrogen bonding and therefore destabilizing the structural integrity of the helices in
domain III. From the analyses of the average secondary structure contents (Table II) and
the secondary structure propensities during the MD time courses (Fig. 4), we estimated
that the thermal unfolding of the helices in domain III of both MproT and MproH follows
the order CIII → EIII → BIII → DIII → AIII. Helix AIII is mainly composed of nonpolar
residues and forms an interior hydrophobic core in domain III, which in turn restricts
its solvent exposure, and it thus maintains higher helical content than the other
helices. The ASA for each residue in helix AIII is nearly zero (data not shown), again
indicating that the hydrophobic environment around helix AIII may protect it from forming
intermolecular hydrogen bonds with water molecules. Furthermore, the result of amino acid
sequence alignment shows that helix AIII exhibits higher sequence identity than the other
helices in domain III among these proteins, which may also emphasize the importance of
helix AIII in maintaining the structural integrity of the globular domain III in Mpro.
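
The average secondary structure content summarized in Table II is defined in Methods as the ratio of residual H bonds at time t to the total H bonds in the starting structure, with H bonds accepted below the DSSP energy criterion of -0.5 kcal/mol. A minimal sketch of both steps is given below. The function names and the representation of an H bond as a (donor residue, acceptor residue) pair are assumptions for illustration; the energy expression is the standard DSSP electrostatic model with partial charges of 0.42e and 0.20e and a dimensional factor of 332.

```python
HBOND_CUTOFF = -0.5  # kcal/mol, the DSSP default criterion quoted in Methods


def dssp_hbond_energy(r_on, r_ch, r_oh, r_cn):
    """DSSP electrostatic H-bond energy (kcal/mol) from four distances in angstroms."""
    return 0.42 * 0.20 * 332.0 * (1.0 / r_on + 1.0 / r_ch - 1.0 / r_oh - 1.0 / r_cn)


def is_hbond(r_on, r_ch, r_oh, r_cn, cutoff=HBOND_CUTOFF):
    """True when the C=O...H-N pair is counted as hydrogen bonded."""
    return dssp_hbond_energy(r_on, r_ch, r_oh, r_cn) < cutoff


def structure_content(initial_hbonds, current_hbonds):
    """Percentage of the starting-structure H bonds still present at time t."""
    initial = set(initial_hbonds)
    if not initial:
        raise ValueError("the starting structure has no hydrogen bonds")
    residual = initial & set(current_hbonds)
    return 100.0 * len(residual) / len(initial)


if __name__ == "__main__":
    # Hypothetical helix: four i -> i+4 H bonds at t = 0, two of them left at time t.
    start = {(14, 10), (15, 11), (16, 12), (17, 13)}
    later = {(14, 10), (16, 12), (24, 20)}
    print(structure_content(start, later))
```

Only bonds present in the starting structure count towards the ratio, so H bonds that form during the simulation (the third pair in the usage example) do not raise the reported content.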
Table II
Average secondary structure content for each secondary structure element in MproT, MproH,
MproST, and MproSH.

Secondary    Average secondary structure content (%)
structure    MproT               MproH               MproST              MproSH
element      300K  400K  600K    300K  400K  600K    300K  400K  600K    300K  400K  600K
aI           75    65    10      35    B     7       60    B     B       99    90    20
bI           2     64    14      64    18    9       93    85    62      95    84    22
cI           35    67    26      85    48    18      65    62    53      90    15    21
AI           3     16    2       3     3     0       -     -     -       90    85    12
dI           59    61    8       40    53    8       -     -     -       68    46    3
eI           33    45    5       50    41    5       -     -     -       91    74    23
fI           1     76    20      83    2     12      50    47    25      55    65    8
aII          60    50    15      61    58    7       37    21    1       56    46    5
bII          45    36    10      34    53    10      36    19    2       86    35    1
cII          45    39    22      44    39    7       33    15    14      87    16    5
dII          42    46    19      60    45    10      88    16    52      82    60    5
eII          49    37    18      65    34    3       86    16    53      81    44    8
fII          22    20    3       22    5     1       52    48    19      49    8     5
AIII         85    56    13      69    63    14      56    44    34      61    42    8
BIII         8     45    5       7     39    6       -     -     -       -     -     -
CIII         37    13    1       1     0     0       RB    36    6       -     -     -
DIII         93    63    9       96    2     9       -     -     -       -     -     -
EIII         92    16    3       35    33    3       90    59    22      66    58    4

In contrast to the specific unfolding order of the helices in domain III, there is no
particular unfolding order of the sheets in domains I and II (Fig. 4 and Table II). The
packing of the sheets in domains I and II is similar to a sandwich, and the catalytic
cleft is located in the middle of this well-organized structure. The nucleophilic Cys144
is located in the center of this catalytic cleft, and some of the residues forming the
substrate binding site S1 are distributed in some of the sheets in domains I and II.
Thus, in order to maintain the proteolytic activity, these sheets have to preserve their
secondary structural integrity. Most of the structural variations in domains I and II at
high temperatures result from the fluctuation of outer loops, which are fully exposed to
the solvent. A previous study has shown that the region around residues 10-20
(corresponding to sheet bI in domain I) is relatively rigid and the region around
residues 265-287 (corresponding to the loop connecting helices DIII and EIII in domain
III) is relatively more flexible than the other regions of MproST (16). The present
results also indicate that the structural network formed by the sheets in domains I and
II is relatively stable during the MD simulation courses compared with the network
formed by the helices in domain III. A short helix AI is observed on the outer surface of
domain I in the crystal structures of MproT and MproH (Fig. 1A and B), whereas this helix
is missing in the homology models of MproST and MproSH (Fig. 1C and D). During the MD
simulation courses, this helix disappeared very quickly due to the contact with the
surrounding water molecules.


Figure 6: The distance (Å) between the Cζ atom of the totally conserved Arg40 from the
substrate binding site S2 and the Cγ atom of the totally conserved Asp186 from the
intervening loop connecting domains II and III (residues numbered as in MproT) as a
function of MD simulation time (ps) for MproT, MproH, MproST, and MproSH at 300, 400,
and 600 K.

Figure 7: The distance between the sulfur atom of the nucleophilic Cys residue and the
Nε2 of the general acid-base catalyst His residue as a function of MD simulation time for
MproT, MproH, MproST, and MproSH at 300, 400, and 600 K. The snapshots of MproT taken at
82 ps, 400 K and at 100 ps, 600 K are shown in Figure 8A and B, respectively.


It has been shown previously that, similarly to 3Cpro (52-54), specific substrate binding
by Mpro is ensured by the well-defined S1 and S2 substrate binding pockets (15). In
addition, it has also been shown that the imidazole side chain of the conserved His
residue, which is located in the center of a hydrophobic pocket, interacts with the P1
carboxamide side chain of the substrate. This specific interaction is generally
considered to determine the picornavirus 3Cpro specificity for a Gln residue at P1
(52-54). The totally conserved His162 of both MproT and MproH or His163 of MproS is
located at the very bottom of this hydrophobic pocket, which is formed by the totally
conserved residue Phe139 of both MproT and MproH or Phe140 of MproS and the main-chain
atoms of Ile140, Leu164, Glu165, and His171 of MproT, Ile140, Ile164, Glu165, and His171
of MproH, or Leu141, Met165, Glu166, and His172 of MproS. The totally conserved Glu165 of
MproT and MproH or Glu166 of MproS forms an ion pair with the totally conserved His171 of
MproT and MproH or His172 of MproS (15). This salt bridge is itself on the periphery of
these molecules, forming part of the outer wall of the substrate binding site S1.
Figure 5 shows the ASAs of the substrate binding sites S1 and S2 of MproT, MproH, MproST,
and MproSH at various temperatures. In general, S2 exhibits higher ASAs than S1 during
the MD simulation courses. In addition, S1 was found to maintain its structural
integrity, whereas S2 exhibits more structural variations during the MD simulations.
This is attributed to the fact that S2 is located at the open mouth of the catalytic
cleft between domains I and II and thus is fully exposed to the surrounding solvent,
whereas S1 is situated at the very bottom of this cleft and is consequently protected by
the hydrophobic core. The higher structural variation of S2 is necessary for the
proteolytic activity of Mpro because it makes S2 flexible enough to accommodate a bulky
hydrophobic residue from the substrate and further allows the substrate to form close
contact with the substrate binding cleft formed by S1 and S2 of this enzyme.


Previous study has indicated that the loop connecting domains II and III (residues 
184-199) plays an important role in maintaining the proteolytic activity of MP°°T 
(15). This intervening loop is located in adjacent to the substrate binding site S2. 
Moreover, the totally conserved Arg40 from S2 forms an electrostatic interaction 


Downloaded by [University of Connecticut] at 13:22 12 October 2014 


with the totally conserved Asp186 from this extended loop (15). In order to investigate 
the importance of this electrostatic interaction in maintaining the structural 
integrity of S2 during the MD simulations, the distance between the Cζ atom of 
Arg40 and the Cγ atom of Asp186 (residues numbered as in MproT) was measured 
for each protein, and the results are shown in Figure 6. In addition, the distance 
between the S atom of the nucleophilic Cys residue and the Nε2 atom of the general 
acid-base catalyst His residue was also recorded for each structure as a function of MD 
simulation time, and the results are given in Figure 7. At higher temperatures, these 
distances all increase significantly, indicating that the electrostatic interaction 
formed by Arg40 and Asp186 and the catalytic Cys-His pair are both disrupted 
upon heating. Interestingly, the distance between the Cys-His pair 
increases dramatically for MproT between 75 and 90 ps at 400 K. In order to compare 
the partially unfolded structure with the totally unfolded one, snapshots of 
MproT at 82 ps, 400 K (the distances between Arg40 and Asp186 and between His41 
and Cys144 are 7.79 and 9.04 Å, respectively) and at 100 ps, 600 K (the distances 
between Arg40 and Asp186 and between His41 and Cys144 are 11.18 and 16.08 Å, 
respectively) were generated, as shown in Figure 8A and B, respectively. According to these 
snapshots, the former structure still maintains most of its secondary structural 
integrity, whereas most of the secondary structures disappear in the latter. In 
addition, the substrate binding sites S1 and S2 still maintain their structural integrity 
in the partially unfolded MproT, whereas these two binding sites are shifted and 
destroyed in the totally unfolded MproT. This indicates that the electrostatic interaction 
between Arg40 and Asp186 plays an important role in maintaining the packing of 
S1 and S2, thus preserving the proteolytic activity of this enzyme (15). From the 
above results, we may conclude that MproT may still maintain its proteolytic activity 
while it is partially unfolded and that the electrostatic interaction between Arg40 
and Asp186 functions as a gate controlling the open and closed states of the substrate 
binding site S2. Previous MD simulations have shown that MproST complexed 
with an inhibitor is, on average, less flexible than the free enzyme in either the 
monomeric or the dimeric form (16). Our simulation results also indicate that water 
molecules may enter S2 and further penetrate S1 without the protection of a 
bound inhibitor when the electrostatic interaction between Arg40 and Asp186 is 
destroyed at high temperatures, resulting in the distortion and disruption of the packing 
of these two sites, which are mainly lined by hydrophobic residues. 
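
The distance monitoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this study: each trajectory frame is assumed to be a mapping from a hypothetical atom label (for example "ARG40:CZ" or "ASP186:CG") to Cartesian coordinates in Å, and the cutoff used to flag a broken contact is a free parameter chosen by the user rather than a value taken from the paper.

```python
import math

def distance_series(frames, atom_a, atom_b):
    """Distance in Å between two labelled atoms in every frame.

    frames: list of dicts mapping an atom label to an (x, y, z) tuple in Å.
    The label format is hypothetical; any hashable key works.
    """
    return [math.dist(frame[atom_a], frame[atom_b]) for frame in frames]

def first_time_above(times_ps, distances, cutoff):
    """Earliest time (ps) at which the distance exceeds cutoff, or None if it never does."""
    for t, d in zip(times_ps, distances):
        if d > cutoff:
            return t
    return None
```

Applied to the Arg40-Asp186 pair and to the Cys-His pair in turn, the first function would give curves of the kind plotted against simulation time, and the second would report when a chosen separation is first exceeded at a given temperature.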

Figure 8: Snapshots of MproT taken at (A) 82 ps, 400 K and (B) 100 ps, 600 K. 
Secondary structures predicted according to DSSP (43) are shown as in Figure 1. 
Substrate binding sites S1 and S2 are represented as CPK and colored in indigo and 
brown, respectively. The nucleophilic Cys144 and the general acid-base catalyst 
His41 are shown in purple as sticks. The totally conserved residues Arg40 and 
Asp186 forming the electrostatic interaction in the native structure of MproT are 
shown in grey as sticks. These structures are visualized by the Insight II program. 


In conclusion, the homology models of MproS were successfully constructed based 
on the crystal structures of both MproT (15) and MproH (14) by a comparative 
approach. Both MproST and MproSH exhibit folds similar to those of their respective 
templates, MproT and MproH. Three distinct functional domains as well as an intervening 
loop from residues 184 to 199 connecting domains II and III are also 
present in these homology models, as in the template proteins. A catalytic cleft 
containing the substrate binding sites S1 and S2 between domains I and II is also 
observed in these homology models. S2 undergoes more dramatic structural 
changes than S1 because it is located at the open mouth of the catalytic cleft and is 
fully exposed to the solvent, whereas S1 is situated at the very bottom of this cleft 
and is well protected by the hydrophobic core. The unfolding of these proteins 
begins at domain III, where the structure is least conserved among these proteins. 
Mpro may still maintain its proteolytic activity while it is partially unfolded. The 
electrostatic interaction between the totally conserved Arg40 from the substrate 
binding site S2 and the totally conserved Asp186 from the intervening loop 
between domains II and III (residues numbered as in MproT) plays an important 
role in maintaining the structural integrity of both S1 and S2. 